transformer-based framework
POIFormer: A Transformer-Based Framework for Accurate and Scalable Point-of-Interest Attribution
Saxena, Nripsuta Ani, Hsu, Shang-Ling, Shetty, Mehul, Alkhadra, Omar, Shahabi, Cyrus, Horn, Abigail L.
Accurately attributing user visits to specific Points of Interest (POIs) is a foundational task for mobility analytics, personalized services, marketing and urban planning. However, POI attribution remains challenging due to GPS inaccuracies, typically ranging from 2 to 20 meters in real-world settings, and the high spatial density of POIs in urban environments, where multiple venues can coexist within a small radius (e.g., over 50 POIs within a 100-meter radius in dense city centers). Relying on proximity is therefore often insufficient for determining which POI was actually visited. We introduce \textsf{POIFormer}, a novel Transformer-based framework for accurate and efficient POI attribution. Unlike prior approaches that rely on limited spatiotemporal, contextual, or behavioral features, \textsf{POIFormer} jointly models a rich set of signals, including spatial proximity, visit timing and duration, contextual features from POI semantics, and behavioral features from user mobility and aggregated crowd behavior patterns--using the Transformer's self-attention mechanism to jointly model complex interactions across these dimensions. By leveraging the Transformer to model a user's past and future visits (with the current visit masked) and incorporating crowd-level behavioral patterns through pre-computed KDEs, \textsf{POIFormer} enables accurate, efficient attribution in large, noisy mobility datasets. Its architecture supports generalization across diverse data sources and geographic contexts while avoiding reliance on hard-to-access or unavailable data layers, making it practical for real-world deployment. Extensive experiments on real-world mobility datasets demonstrate significant improvements over existing baselines, particularly in challenging real-world settings characterized by spatial noise and dense POI clustering.
A Transformer-Based Framework for Greek Sign Language Production using Extended Skeletal Motion Representations
Pratikaki, Chrysa, Filntisis, Panagiotis, Katsamanis, Athanasios, Roussos, Anastasios, Maragos, Petros
Building on To address communication barriers between the DHH (Deaf and insights from previous research, we propose a deep learning model Hard-of-Hearing) and the hearing communities, the field of Sign for Sign Language Production (SLP), which to our knowledge is Language Processing has emerged at the intersection of linguistics, the first attempt on Greek SLP. We tackle this task by utilizing a computer vision, and machine learning. Sign Language Processing transformer-based architecture that enables the translation from encompasses a variety of tasks aimed at bridging the gap between text input to human pose keypoints, and the opposite. We evaluate DHH and hearing communities by enabling the automatic translation, the effectiveness of the proposed pipeline on the Greek SL dataset and generation of sign language. The most critical components Elementary23, through a series of comparative analyses and ablation of an effective sign language system are Sign Language Translation studies. Our pipeline's components, which include data-driven (SLT), and Sign Language Production (SLP). In this paper, we gloss generation, training through video to text translation and a primarily focus on Sign Language Production (SLP).
A Full Transformer-based Framework for Automatic Pain Estimation using Videos
Gkikas, Stefanos, Tsiknakis, Manolis
The automatic estimation of pain is essential in designing an optimal pain management system offering reliable assessment and reducing the suffering of patients. In this study, we present a novel full transformer-based framework consisting of a Transformer in Transformer (TNT) model and a Transformer leveraging cross-attention and self-attention blocks. Elaborating on videos from the BioVid database, we demonstrate state-of-the-art performances, showing the efficacy, efficiency, and generalization capability across all the primary pain estimation tasks.
PINNsFormer: A Transformer-Based Framework For Physics-Informed Neural Networks
Zhao, Zhiyuan, Ding, Xueying, Prakash, B. Aditya
Physics-Informed Neural Networks (PINNs) have emerged as a promising deep learning framework for approximating numerical solutions to partial differential equations (PDEs). However, conventional PINNs, relying on multilayer perceptrons (MLP), neglect the crucial temporal dependencies inherent in practical physics systems and thus fail to propagate the initial condition constraints globally and accurately capture the true solutions under various scenarios. In this paper, we introduce a novel Transformer-based framework, termed PINNsFormer, designed to address this limitation. PINNsFormer can accurately approximate PDE solutions by utilizing multi-head attention mechanisms to capture temporal dependencies. PINNsFormer transforms point-wise inputs into pseudo sequences and replaces point-wise PINNs loss with a sequential loss. Additionally, it incorporates a novel activation function, Wavelet, which anticipates Fourier decomposition through deep neural networks. Empirical results demonstrate that PINNsFormer achieves superior generalization ability and accuracy across various scenarios, including PINNs failure modes and high-dimensional PDEs. Moreover, PINNsFormer offers flexibility in integrating existing learning schemes for PINNs, further enhancing its performance.
A Transformer-based Framework For Multi-variate Time Series: A Remaining Useful Life Prediction Use Case
Ogunfowora, Oluwaseyi, Najjaran, Homayoun
In recent times, Large Language Models (LLMs) have captured a global spotlight and revolutionized the field of Natural Language Processing. One of the factors attributed to the effectiveness of LLMs is the model architecture used for training, transformers. Transformer models excel at capturing contextual features in sequential data since time series data are sequential, transformer models can be leveraged for more efficient time series data prediction. The field of prognostics is vital to system health management and proper maintenance planning. A reliable estimation of the remaining useful life (RUL) of machines holds the potential for substantial cost savings. This includes avoiding abrupt machine failures, maximizing equipment usage, and serving as a decision support system (DSS). This work proposed an encoder-transformer architecture-based framework for multivariate time series prediction for a prognostics use case. We validated the effectiveness of the proposed framework on all four sets of the C-MAPPS benchmark dataset for the remaining useful life prediction task. To effectively transfer the knowledge and application of transformers from the natural language domain to time series, three model-specific experiments were conducted. Also, to enable the model awareness of the initial stages of the machine life and its degradation path, a novel expanding window method was proposed for the first time in this work, it was compared with the sliding window method, and it led to a large improvement in the performance of the encoder transformer model. Finally, the performance of the proposed encoder-transformer model was evaluated on the test dataset and compared with the results from 13 other state-of-the-art (SOTA) models in the literature and it outperformed them all with an average performance increase of 137.65% over the next best model across all the datasets.
A Transformer-based Framework for POI-level Social Post Geolocation
Li, Menglin, Lim, Kwan Hui, Guo, Teng, Liu, Junhua
POI-level geo-information of social posts is critical to many location-based applications and services. However, the multi-modality, complexity and diverse nature of social media data and their platforms limit the performance of inferring such fine-grained locations and their subsequent applications. To address this issue, we present a transformer-based general framework, which builds upon pre-trained language models and considers non-textual data, for social post geolocation at the POI level. To this end, inputs are categorized to handle different social data, and an optimal combination strategy is provided for feature representations. Moreover, a uniform representation of hierarchy is proposed to learn temporal information, and a concatenated version of encodings is employed to capture feature-wise positions better. Experimental results on various social datasets demonstrate that three variants of our proposed framework outperform multiple state-of-art baselines by a large margin in terms of accuracy and distance error metrics.